-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finish migration of vars_funs
module for Python package
#32
Finish migration of vars_funs
module for Python package
#32
Conversation
… test it out" This reverts commit 9cc256b.
python/ccao/vars_funs.py
Outdated
human-readable format. For example, EXT_WALL = 2 will become | ||
EXT_WALL = "Frame + Masonry". Note that the values and their translations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: It's wrong in the R docs too, but EXT_WALL = 2
is "Masonry", not "Frame + Masonry".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks! Fixed in d135b9a.
python/ccao/vars_funs.py
Outdated
def vars_recode( | ||
data: pd.DataFrame, | ||
cols: list[str] | None = None, | ||
code_type: str = "long", | ||
as_factor: bool = True, | ||
dictionary: pd.DataFrame | None = None, | ||
) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to keep the interfaces across the R and Python versions of our major packages the same. Maybe we can rename the R function inputs here and then release a major version as we did with assesspy
. While we're at it we can trim out some of the unused functionality of the R version.
python/ccao/vars_funs.py
Outdated
if dictionary.empty: | ||
raise ValueError("dictionary must be a non-empty pandas DataFrame") | ||
|
||
required_columns = {"var_code", "var_value", "var_value_short"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (blocking): var_name
, var_type
, and var_data_type
should be required as well since they're used below.
Edit: Plus all the var_name_
columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I added stricter validation in f5ee577. Note that we check for any column matching the pattern var_name_*
, since technically the function doesn't care what the names are, or how many of them there are.
python/docs/source/reference.rst
Outdated
Dictionaries | ||
^^^^^^^^^^^^ | ||
|
||
Lookups for numeric codes used in the assessment system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Also used for vars_rename
, so not just for the numeric code lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, clarified in 2a82c96.
@@ -7,7 +7,7 @@ | |||
|
|||
# Load the default variable dictionary | |||
_data_path = importlib.resources.files(ccao.data) | |||
vars_dict = pd.read_csv(str(_data_path / "vars_dict.csv")) | |||
vars_dict = pd.read_csv(str(_data_path / "vars_dict.csv"), dtype=str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (non-blocking): Won't this prevent matching to numeric values in the original data? i.e. 1
instead of "1"
.
{ | ||
"input": { | ||
"athena": { | ||
"name": "char_bsmt", | ||
"value": ["1", "3", "4", "5"], | ||
}, | ||
"iasworld": { | ||
"name": "bsmt", | ||
"value": ["1", "3", "4", "5"], | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (blocking): We should add a test for numeric values (and update the function to handle those) or make it clear to the user that vars_recode
expect all values to be strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking, although I'm going to wait to update anything until we decide on what the expected behavior should be in this thread.
This reverts commit 5637158.
Co-authored-by: Dan Snow <[email protected]>
@dfsnow This should be ready for another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this looks set to go now!
…e-spark-compatible Make the the Python package compatible with Athena PySpark
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #32 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 6
Lines 382 382
=========================================
Hits 382 382 ☔ View full report in Codecov by Sentry. |
This PR builds on #31 by defining a
vars_recode
function in the Python package, which completes the migration of thevars_funs
module to Python.